<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head>



<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="generator" content="http://www.movabletype.org/"><title>Well, I'm Back: Parallel DOM Access</title>



<link rel="stylesheet" href="parallel_dom_ac_files/styles-site.css" type="text/css">
<link rel="alternate" type="application/rss+xml" title="RSS" href="http://weblogs.mozillazine.org/roc/index.rdf">
<link rel="alternate" type="application/atom+xml" title="Atom" href="http://weblogs.mozillazine.org/roc/atom.xml">

<link rel="start" href="http://weblogs.mozillazine.org/roc/" title="Home">
<link rel="prev" href="http://weblogs.mozillazine.org/roc/archives/2007/09/textalicious.html" title="Textalicious">

<link rel="next" href="http://weblogs.mozillazine.org/roc/archives/2007/10/triage.html" title="Triage">


<script type="text/javascript" language="javascript">
<!--

var HOST = 'weblogs.mozillazine.org';

// Copyright (c) 1996-1997 Athenia Associates.
// http://www.webreference.com/js/
// License is granted if and only if this entire
// copyright notice is included. By Tomer Shiran.

function setCookie (name, value, expires, path, domain, secure) {
    var curCookie = name + "=" + escape(value) +
        (expires ? "; expires=" + expires.toUTCString() : "") +
        (path ? "; path=" + path : "") +
        (domain ? "; domain=" + domain : "") +
        (secure ? "; secure" : "");
    document.cookie = curCookie;
}

function getCookie (name) {
    var prefix = name + '=';
    var c = document.cookie;
    var nullstring = '';
    var cookieStartIndex = c.indexOf(prefix);
    if (cookieStartIndex == -1)
        return nullstring;
    var cookieEndIndex = c.indexOf(";", cookieStartIndex + prefix.length);
    if (cookieEndIndex == -1)
        cookieEndIndex = c.length;
    return unescape(c.substring(cookieStartIndex + prefix.length, cookieEndIndex));
}

function deleteCookie (name, path, domain) {
    if (getCookie(name))
        document.cookie = name + "=" + ((path) ? "; path=" + path : "") + ((domain) ? "; domain=" + domain : "") + "; expires=Thu, 01-Jan-70 00:00:01 GMT";
}

function fixDate (date) {
    var base = new Date(0);
    var skew = base.getTime();
    if (skew > 0)
        date.setTime(date.getTime() - skew);
}

function rememberMe (f) {
    var now = new Date();
    fixDate(now);
    now.setTime(now.getTime() + 365 * 24 * 60 * 60 * 1000);
    if (f.author != undefined)
       setCookie('mtcmtauth', f.author.value, now, '/', HOST, '');
    if (f.email != undefined)
       setCookie('mtcmtmail', f.email.value, now, '/', HOST, '');
    if (f.url != undefined)
       setCookie('mtcmthome', f.url.value, now, '/', HOST, '');
}

function forgetMe (f) {
    // Delete with the same path ('/') the cookies were set with,
    // otherwise the deletion cookie won't match.
    deleteCookie('mtcmtmail', '/', HOST);
    deleteCookie('mtcmthome', '/', HOST);
    deleteCookie('mtcmtauth', '/', HOST);
    f.email.value = '';
    f.author.value = '';
    f.url.value = '';
}

//-->
</script></head><body>

<div id="container">

<div id="banner">
<h1><a href="http://weblogs.mozillazine.org/roc/" accesskey="1">Well, I'm Back</a></h1>
<h2>Robert O'Callahan. Christian. Repatriate Kiwi. Mozilla hacker.</h2>
</div>

<div class="content">

<p align="right">
<a href="http://weblogs.mozillazine.org/roc/archives/2007/09/textalicious.html">« Textalicious</a> |

<a href="http://weblogs.mozillazine.org/roc/">Main</a>
| <a href="http://weblogs.mozillazine.org/roc/archives/2007/10/triage.html">Triage »</a>

</p>

<h2>September 25, 2007</h2>

<h3>Parallel DOM Access</h3>

<div class="columns"><p>Suppose we want to exploit parallelism to load and run an existing Web application faster. How might that be done?
</p><p>Here's a very high level diagram of the data flow inside the
browser. I believe it's reasonably accurate for Webkit as well as
Gecko.
</p><div style="text-align: center;"><iframe src="parallel_dom_ac_files/BrowserDataFlow.svg" style="border: 0pt none ; width: 285px; height: 315px;"></iframe></div>
<p>There are some missing edges in some cases; for example, the style
system can affect the DOM via XBL. I'm going to gloss over those for
now.
</p><p>Observe that the DOM is really the center of the universe. It's
a big blob of mutable state that everything else depends on. Any
parallelization of single-page applications is going to have to
carefully manage access to the DOM.
</p><p>For performance reasons, changes to the DOM are lazily
propagated to the style system, layout and rendering. The goal is to
batch changes so that these phases are not reexecuted too often ---
they're expensive, even when incremental. DOM updates make method calls
to registered listeners. Listeners record which styles might have been
affected and post a restyle event (if one hasn't already been queued)
to be processed later. Similarly, they set dirty bits in the frame tree
and then post a reflow event. On the other side, the parser batches its
appends to the DOM as well. It's primarily only script execution that
makes many fine-grained, interleaved reads and writes.
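</p>

<p>To make the batching concrete, here is a toy sketch of coalesced notifications (all names invented for illustration; the real listener interfaces are more involved):</p>

```javascript
// Toy sketch of coalesced change notifications (invented names; the
// real Gecko listener interfaces look nothing like this). Mutations
// record dirty state and post at most one pending restyle event, so
// the expensive restyle pass runs once per batch, not once per change.
class RestyleManager {
  constructor(postEvent) {
    this.postEvent = postEvent;  // queues a callback to run "later"
    this.dirty = new Set();      // nodes whose style may have changed
    this.eventPosted = false;
    this.restyleRuns = 0;        // how many times the expensive pass ran
  }
  noteStyleChanged(node) {
    this.dirty.add(node);
    if (!this.eventPosted) {     // coalesce: one event in flight, max
      this.eventPosted = true;
      this.postEvent(() => this.processRestyle());
    }
  }
  processRestyle() {
    this.eventPosted = false;
    this.restyleRuns++;          // stand-in for the real restyle work
    this.dirty.clear();
  }
}

// Toy event loop: 100 DOM mutations produce a single restyle pass.
const queue = [];
const mgr = new RestyleManager(cb => queue.push(cb));
for (let i = 0; i < 100; i++) mgr.noteStyleChanged("node" + i);
while (queue.length) queue.shift()();
console.log(mgr.restyleRuns);    // 1
```

<p>A script that makes a hundred style changes still triggers a single restyle pass, because only the first change posts the event.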
</p><p>[There are times when scripts need access to up-to-date style
and layout data. In Gecko, at those times we invoke
FlushPendingNotifications to force synchronous restyling and layout.
Because this is already a significant performance hit, Web developers
should already be trying to avoid triggering it (e.g., by limiting
their accesses to DOM <tt>offset*</tt> properties, which force such a
flush).]
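</p>

<p>A toy model of why these flushes hurt (invented code, not a real browser API): each layout-dependent read must flush if pending writes have made layout dirty, so interleaving reads with writes pays a flush per iteration, while batching all writes before all reads pays once:</p>

```javascript
// Toy model (not a real browser API) of the synchronous-flush cost:
// reading a layout value (think elem.offsetHeight) must first flush
// any pending changes if layout is dirty.
class ToyLayout {
  constructor() { this.dirty = false; this.flushes = 0; }
  write() { this.dirty = true; }          // e.g. a style change
  readOffset() {                          // e.g. elem.offsetHeight
    if (this.dirty) {                     // FlushPendingNotifications:
      this.flushes++;                     // expensive in real life
      this.dirty = false;
    }
    return 42;                            // dummy layout result
  }
}

// Interleaved read/write: one synchronous flush per iteration.
const interleaved = new ToyLayout();
for (let i = 0; i < 10; i++) { interleaved.write(); interleaved.readOffset(); }

// Batched: all writes first, then all reads, so a single flush.
const batched = new ToyLayout();
for (let i = 0; i < 10; i++) batched.write();
for (let i = 0; i < 10; i++) batched.readOffset();

console.log(interleaved.flushes, batched.flushes);  // 10 1
```

<p>In the toy model a flush is just a counter bump; in a real engine each one is a synchronous restyle and reflow.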
</p><p>I think we could leverage these access patterns to help parallelize browser execution:
</p><ul>
<li>Parsing can be mostly asynchronous; it should only need to
synchronize with the DOM when it's ready to append a bunch of elements.
</li><li>Styling, layout and rendering can take a <em>snapshot</em> of
the DOM and use it while the real DOM is being updated by other tasks.
It's OK for the display of a page to lag slightly behind; it already
does today. The snapshot will be refreshed each time we need to update
the window again, or when layout results must be returned synchronously
to scripts.
</li><li>To run scripts, such as JS event handlers, in parallel, we
could use optimistic concurrency via transactions. The idea is that we
run a script speculatively against a snapshot of the DOM, recording
which parts of the DOM it reads and suppressing its changes to the
state of the DOM and other parts of the world. When the script has
completed, we synchronize with the real DOM and check whether anything
the script depended on has changed. If not, then we can commit the
script's global changes. Otherwise, its execution is invalid; we throw
out its changes and reexecute it all over again.
</li><li>Some DOM-accessing code cannot be executed speculatively,
such as plugins, or Javascript that takes actions that cannot be rolled
back, such as sending a message to a server. (At CMU, the canonical
example of an irrevocable message was "fire missile!") Therefore at any
point in time there can be (at most) one thread accessing the "real
DOM" which is not speculative and will never be asked to abort.
</li></ul>
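<p>Here's a minimal sketch of the optimistic-concurrency idea, with an invented versioned key-value store standing in for the DOM; a real implementation would need much finer-grained dependency tracking and rollback of every side effect:</p>

```javascript
// Sketch of optimistic (transactional) script execution against a
// versioned DOM-like store. Invented API: a stand-in for the DOM,
// not any real browser interface.
class VersionedStore {
  constructor() { this.values = new Map(); this.versions = new Map(); }
  get(key) { return this.values.get(key); }
  version(key) { return this.versions.get(key) || 0; }
  set(key, value) {
    this.values.set(key, value);
    this.versions.set(key, this.version(key) + 1);
  }
}

// Run `script(tx)` speculatively: record reads, buffer writes, then
// validate that nothing read has changed before committing.
function runTransaction(store, script) {
  const reads = new Map();   // key -> version observed
  const writes = new Map();  // buffered updates
  const tx = {
    get(key) {
      if (writes.has(key)) return writes.get(key);
      reads.set(key, store.version(key));
      return store.get(key);
    },
    set(key, value) { writes.set(key, value); },
  };
  script(tx);
  for (const [key, ver] of reads)
    if (store.version(key) !== ver) return false;  // conflict: abort
  for (const [key, value] of writes) store.set(key, value);
  return true;                                     // committed
}

const dom = new VersionedStore();
dom.set("title", "old");
// A speculative script commits when nothing it read was changed...
const ok = runTransaction(dom, tx => tx.set("title", tx.get("title") + "!"));
// ...but aborts when a concurrent write invalidated its reads.
const fail = runTransaction(dom, tx => {
  const t = tx.get("title");
  dom.set("title", "interfering write"); // simulated concurrent mutation
  tx.set("title", t + "?");
});
console.log(ok, fail);  // true false
```

<p>This is essentially the read-validation scheme of software transactional memory; the hard part in a browser is that "reads" and "writes" cover everything a script can observe or affect, not a tidy key-value store.</p>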
<p>This model would be particularly useful for Firefox where extensions
written in Javascript have access to the DOMs of all browsed pages. We
could allow extensions to execute in parallel with client code as long
as at least one of them is speculative.
</p><p>None of this is easy to implement. Making real snapshots of the
DOM would be prohibitively expensive, so some kind of versioning or
copy-on-write scheme is required. Which state constitutes "the DOM"
that needs versioning is actually a little nebulous; for example, would
we version the state of <tt>&lt;canvas&gt;</tt> elements? How about
DOM Storage or the proposed DOM SQL API? However, the good news is that
some of this can be done incrementally. For example, we could start
with a limited slice of DOM state that is versioned, and decree that
speculative script accessing any other state is aborted, and then
gradually increase the scope of versioning.
</p><p>There's a lot more to think and talk about in this area.</p></div>
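<p>One way to make snapshots cheap enough is a persistent, copy-on-write tree: a mutation copies only the path from the changed node up to the root and shares every untouched subtree, so old versions remain valid for free. A sketch (invented structure, far simpler than a real DOM):</p>

```javascript
// Sketch of snapshots via structural sharing (invented structure):
// taking a snapshot is free (keep the old root), and a mutation
// copies only the path from the changed node up to the root.
function node(name, children = []) { return { name, children }; }

// Return a new root with `fn` applied to the node at `path`,
// sharing every untouched subtree with the old root.
function withChange(root, path, fn) {
  if (path.length === 0) return fn(root);
  const [i, ...rest] = path;
  const children = root.children.slice();   // copy one level only
  children[i] = withChange(children[i], rest, fn);
  return { ...root, children };
}

const v1 = node("html", [node("head"), node("body", [node("p")])]);
const v2 = withChange(v1, [1, 0], n => ({ ...n, name: "div" }));

// v1 is an intact snapshot; v2 shares the unchanged "head" subtree.
console.log(v1.children[1].children[0].name);   // "p"
console.log(v2.children[1].children[0].name);   // "div"
console.log(v1.children[0] === v2.children[0]); // true (shared)
```

<p>This is the standard trick from purely functional data structures; the open question is whether it can be retrofitted onto a large mutable C++ DOM without unacceptable overhead.</p>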

<a name="a018567more"></a>
<a id="more"></a>


<p class="posted">Posted by roc at September 25, 2007  4:19 PM</p>




<h2><a id="comments"></a>Comments</h2>

<a id="c1706278"></a>
<p>Roughly what percentage of time do the various blocks in your diagram take up today, in Gecko?</p>

<p>I'm trying to think what the best-case win for your parallelized
scheme is, and I'm wondering whether it's much of a win. It seems like
the biggest wins here come when one of the following is true:<br>
* Parsing takes a long (CPU) time, so we frequently have a
significantly updated DOM that we'd like to lay out and render, if it
didn't block further parsing. Does this really happen? Seems like it
wouldn't be likely.<br>
* Script is doing lots of updates to the DOM for a long period of time.
I guess this could be the case if we have a lot of Greasemonkey scripts
(or extensions, as you mentioned), but I wonder how applicable it is in
general.</p>

<p>Even in these cases, the win is mainly that your page appears to
refresh slightly more quickly. It doesn't seem like in the common case
this makes your rendering noticeably faster, or makes script run lots
faster (unless you had lots of simultaneous readers of the DOM data...
maybe with Gears' multithreaded JS?). Am I totally missing your point?</p>

<p>I feel like a bigger win for Firefox would come from either finding
a way to pull UI event handling/updating out onto its own
thread/process, or at least making all the other pieces of the browser
enforce maximum running times on themselves before giving the message
loop another crack at things. That would take care of one of my biggest
annoyances with Firefox, namely the way that large pages which finally
finish loading or Flash videos which are doing lots of processing or
network access cause the browser's UI to hang. If the app was always
responsive it would go a long way to making it feel both faster and
more stable.</p>
<p class="posted">Posted by: Peter Kasting  at September 25, 2007  6:59 PM</p>
<a id="c1706964"></a>
<p>Snapshot DOM (SDOM) is exactly what I've been thinking lately,
though only to be used by style/layout/rendering (slr). To help that,
calls to synchronously flush layout should be reduced. Making sure that
FlushPendingNotifications (or whatever in other engines) is used only
when really needed might help performance even now, and if slr could be
moved to a separate thread, then even more.</p>

<p>One thing to think about is event handling. Events are targeted
based on layout objects etc. If layout is using SDOM, and DOM has been
changed, what should be done? Possibly resynchronize SDOM based on
version information in DOM. Hopefully synchronization doesn't happen
too often, like for every mousemove event. </p>

<p>Could the slr be split somehow to be run on several threads?
Rendering/painting using a snapshot of layout... mem usage would climb
if there were many kinds of snapshots of the system...</p>
<p class="posted">Posted by: smaug  at September 25, 2007  8:56 PM</p>
<a id="c1707560"></a>
<p>Peter:<br>
&gt; Roughly what percentage of time do the various<br>
&gt; blocks in your diagram take up today, in Gecko?</p>

<p>Depends entirely on the page, but they're all significant.</p>

<p>&gt; * Parsing takes a long (CPU) time, so we<br>
&gt; frequently have a significantly updated DOM<br>
&gt; that we'd like to lay out and render, if it<br>
&gt; didn't block further parsing. Does this really<br>
&gt; happen? Seems like it wouldn't be likely.</p>

<p>Parsing is often a significant amount of CPU time. It would be very
nice to be able to overlap it with inline script execution and with
incremental layout and rendering.</p>

<p>&gt; Script is doing lots of updates to the DOM for<br>
&gt; a long period of time. I guess this could be<br>
&gt; the case if we have a lot of Greasemonkey<br>
&gt; scripts (or extensions, as you mentioned), but<br>
&gt; I wonder how applicable it is in general.</p>

<p>Script-heavy apps are getting more and more common. And many are
already coded with lots of callbacks, handling AJAX responses, user
input, periodic timers, animation, etc. These don't all do "lots of
updates" to the DOM, but that's good...</p>

<p>&gt; Even in these cases, the win is mainly that<br>
&gt; your page appears to refresh slightly more<br>
&gt; quickly. It doesn't seem like in the common<br>
&gt; case this makes your rendering noticeably<br>
&gt; faster, or makes script run lots faster (unless<br>
&gt; you had lots of simultaneous readers of the DOM<br>
&gt; data... maybe with Gears' multithreaded JS?).<br>
&gt; Am I totally missing your point?</p>

<p>I'm hypothesizing that we can a) improve page load times and b) improve interactive response times.</p>

<p>Definitely, one of the first things to be done before stepping down
this path would be to test this hypothesis, probably by instrumenting
an engine to capture detailed application behaviour and then simulate a
parallelization scheme to estimate the benefit.</p>

<p>I'm also not suggesting this is the only way to exploit parallelism. This is just one way that might be valuable to us.</p>

<p>&gt; I feel like a bigger win for Firefox would come<br>
&gt; from either finding a way to pull UI event<br>
&gt; handling/updating out onto its own<br>
&gt; thread/process, or at least making all the<br>
&gt; other pieces of the browser enforce maximum<br>
&gt; running times on themselves before giving the<br>
&gt; message loop another crack at things.</p>

<p>I definitely agree. Your first suggestion is hard for us because of
the way XUL, JS and the extension system work (which is also one of our
strengths, so I don't think we should just walk away from it).
Supporting parallel DOM access would actually be a good step towards
supporting running browser UI concurrently with content script, without
breaking the current model.</p>

<p>Your second suggestion is also on the money. Two things that we
should do post-1.9 are "interruptible reflow" (detect pending events,
stop reflow, and then restart it after handling the events), and
putting (some) plugins in their own process(es). But these are separate
issues really, and much easier to solve.</p>

<p>smaug:<br>
&gt; Events are targeted based on layout objects<br>
&gt; etc. If layout is using SDOM, and DOM has been<br>
&gt; changed, what should be done?</p>

<p>We can actually tell whether an event is targeting a dirty frame or
not. I think in common cases we could get away without flushing layout
every time the mouse moves.</p>

<p>&gt; Could the slr be split somehow to be run on<br>
&gt; several threads?</p>

<p>Yep but they probably all want to use the same snapshot.</p>

<p>&gt; mem usage would climb if there were many kinds<br>
&gt; of snapshots of the system...</p>

<p>Yes. It's imperative that you only store the differences between the snapshots, not entire copies of the DOM.</p>
<p class="posted">Posted by: Robert O'Callahan  at September 25, 2007 10:12 PM</p>
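<p>As a rough illustration of the store-only-differences point above (invented structure, not a proposal for the real thing): each live snapshot starts empty and lazily records just the old values that later writes overwrite.</p>

```javascript
// Sketch of diff-based snapshots (invented structure): instead of
// copying the DOM, a snapshot records only the values that writes
// made after it would otherwise destroy.
class DeltaStore {
  constructor() {
    this.values = new Map();       // the "real DOM" state
    this.snapshots = new Set();    // live snapshots to preserve
  }
  snapshot() {
    const store = this;
    const snap = {
      overwritten: new Map(),      // key -> value at snapshot time
      get(key) {
        return this.overwritten.has(key)
          ? this.overwritten.get(key)
          : store.values.get(key); // unchanged: read through
      },
    };
    this.snapshots.add(snap);
    return snap;
  }
  set(key, value) {
    for (const snap of this.snapshots)  // save the old value once
      if (!snap.overwritten.has(key))
        snap.overwritten.set(key, this.values.get(key));
    this.values.set(key, value);
  }
}

const store = new DeltaStore();
store.set("body.width", 800);
const snap = store.snapshot();     // O(1): nothing copied yet
store.set("body.width", 1024);     // saves the old 800 into the snapshot
console.log(snap.get("body.width"), store.values.get("body.width")); // 800 1024
```

<p>An idle snapshot costs almost nothing; its cost grows only with the amount of mutation that happens while it is alive.</p>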
<a id="c1707794"></a>
<p>I would expect that the web2.0 eye candy stuff with animated pull
up, down, through, whatever would be the biggest challenge. And stuff
like finkle's bubbles demo.</p>

<p>All of them create content dynamically (sometimes lots of it) and
then repeatedly modify the DOM to change the style and layout; they
look like a worst-case scenario here to me.</p>

<p>Being able to animate the style natively could give a perf improvement path to web authors in this case, too.</p>
<p class="posted">Posted by: <a href="http://www.axel-hecht.de/blog/" rel="nofollow">Axel Hecht</a>  at September 25, 2007 11:38 PM</p>
<a id="c1708386"></a>
<p>Robert, I think you're going in the right direction. Here are some more thoughts on DOM snapshots from a while back:</p>

<p>http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2005-July/004304.html</p>
<p class="posted">Posted by: <a href="http://glazkov.com/" rel="nofollow">Dimitri Glazkov</a>  at September 26, 2007  1:57 AM</p>
<a id="c1708696"></a>
<p>Do you envision something like RCU as a way to snapshot the DOM? (http://www.rdrop.com/users/paulmck/RCU/whatisRCU.html)</p>

<p>At one point I looked at using RCU to make access to the XPCOM
component registry lock-free in the dominant cases, but it didn't make
it to the top of the stack.</p>
<p class="posted">Posted by: <a href="http://shaver.off.net/diary/" rel="nofollow">shaver</a>  at September 26, 2007  3:07 AM</p>
<a id="c1710675"></a>
<p>I don't think RCU is quite what is needed here.</p>
<p class="posted">Posted by: Robert O'Callahan  at September 26, 2007  9:01 AM</p>
<a id="c1710765"></a>
<p>"Snapshots" and "Copy on write" sound like you're thinking about implementing the DOM as a "Purely Functional Data Structure"?</p>
<p class="posted">Posted by: RichB  at September 26, 2007  9:32 AM</p>
<a id="c1711200"></a>
<p>A few years ago I tried implementing a thread-aware browser and it
was a complete disaster. It is very difficult to get the locking right,
and adding the locks everywhere slows everything down. I concluded that
trying to split a page up using fine-grained parallelism wasn't a good
strategy. We did end up implementing image downloads in threads.</p>

<p>There are other schemes for increasing parallelism in a browser. For
example, each page could be in its own process, with a process managing
the browser frame and tabs. Like MS OLE. This keeps pages from messing
with each other's performance.</p>

<p>Schemes using shared memory can work too. Things are split into
different processes to achieve parallelism and then communicate via
asynchronous messaging. Access to the data structures is serialized but
things like parsing or image decoding happen in the parallel processes
and send messages to update the DOM.</p>

<p>Lately we have been experimenting in git with fine-grained
parallelism. So far it is always slower except when we can go parallel
for something that takes several seconds to complete.</p>
<p class="posted">Posted by: Jon Smirl  at September 26, 2007 11:12 AM</p>
<a id="c1719263"></a>
<p>What about using transparent wrappers for DOM method return values
(and out-of-scope JS values) to defer execution of results until
necessary? </p>

<p>The Javascript value returned from DOM methods (and potentially
passed to them) represents an opaque token. When the script reaches a
point where it needs the real value (an if statement, passing the
value to a single-threaded object, or combining it with a value
independent of the shared state), that becomes a synchronization
point where the code waits for the correct value.</p>

<p>This could limit the amount of rollback for functions. The opaque
tokens could even be passed back into the DOM, where the results would
be evaluated when available.</p>

<p>I think the PyPy developers worked on something like this.<br>
</p>
<p class="posted">Posted by: Matt  at September 27, 2007  6:47 PM</p>
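<p>A rough illustration of the opaque-token scheme described in the comment above (entirely invented API; integrating this with a real JS engine would be far harder): a "DOM read" returns a lazy token, deriving new values from a token stays deferred, and only forcing a token touches the real DOM.</p>

```javascript
// Invented lazy-token API: nothing here is a real browser interface.
let domReads = 0;
const domRead = () => { domReads++; return 800; };  // simulated sync DOM read

function lazyToken(compute) {
  let value, forced = false;
  return {
    force() {                     // the synchronization point
      if (!forced) { value = compute(); forced = true; }
      return value;
    },
    map(fn) {                     // derive a new token, still deferred
      return lazyToken(() => fn(this.force()));
    },
  };
}

const width = lazyToken(domRead);                    // no DOM access yet
const result = width.map(w => w / 2).map(w => w + 10);

console.log(domReads);       // 0: everything is still deferred
console.log(result.force()); // 410: forcing performs the one real read
console.log(domReads);       // 1
```

<p>Forcing a token is the synchronization point; everything before it can run speculatively.</p>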
<a id="c1724301"></a>
<p>On the TV show "24" last year, the US government fired a missile in
a game of brinksmanship with another country but eventually redirected
it into an ocean. I'm not sure how realistic it is, but perhaps missile
launches aren't as irrevocable as they used to be.<br>
</p>
<p class="posted">Posted by: <a href="http://melez.com/mykzilla/" rel="nofollow">Myk Melez</a>  at September 28, 2007  8:04 AM</p>
<a id="c1725692"></a>
<p>You can abort the flight but you can't get the missile back into its launch tube :-)</p>
<p class="posted">Posted by: Robert O'Callahan  at September 28, 2007 10:58 AM</p>
<a id="c1784228"></a>
<p>The language you are using here makes me think of file systems...
You could view the DOM as a file system with elements/nodes as
directories and attributes as files (with content being attributes on
anonymous nodes), exactly like the DOM inspector does. Documents become
'volumes' and are 'mounted' at various points.</p>

<p>Maybe the right thing to do is to think like this. Abandon C++
object thinking for a while and build the DOM as an in memory file
system, using a special memory allocator. Then layer the objects back
onto the 'files' as an abstraction.</p>

<p>In this case the active set of C++ objects could be a lot smaller
than the actual set of DOM nodes, because you could only have active
C++ objects for the nodes currently being accessed by methods.</p>

<p>And if you had some sort of 'mmap' support you could make the
construction of C++ nodes be very memory conservative for nodes with
lots of data (images, text).</p>

<p>This would allow you to leverage a lot of work on parallel access to
file systems, including locking, snapshots, copy-on-write,
transactions, caching, garbage collection, etc.</p>

<p>I'm not sure that the style system is at the right level in that
picture though? The style information is layered on top of the DOM,
like a unionfs in some ways... I guess this is what your dashed arrows
are trying to say. A unionfs would also allow you to have a layer on
top of a parsed HTML page which represented the active version of the
page, where all the JavaScript changes would go, and which would be
layered on the original DOM to provide the back/forward cache.</p>

<p>The unionfs idea could be used to do the transactions. Snapshot the
DOM, 'mount' a new unionfs layer on the snapshot, then try to fold it
back in after completion.</p>

<p>You could also use this 'memory file system' and the real mmap to
cache the DOM on disk rather than raw pages. It would give you a good
speedup, since you could avoid parsing the page.</p>

<p>One would have to do some careful thinking about what information
you would want in the file system. Do creation time and ownership
have any real purpose in the DOM? The DOM also requires order
preserving between directory entries. You would also want to have very
small blocks, and forget about disk platters etc. But on the plus side,
there is a lot of free code, and you can sift through it looking for
good ideas. You could probably even prototype the system on a current
memory file system like tmpfs, using it to develop the DOM C++ code,
and seeing what your access patterns looked like, and then using that
profiling info to build your special file system.</p>

<p>Maybe at a lower level, the problem here is that the browser is
seen as an application, and not the kernel of a new type of OS.</p>

<p>OTOH, I probably don't know what I'm talking about.<br>
  -Jeremy<br>
</p>
<p class="posted">Posted by: <a href="http://people.freebsd.org/%7Ereg/" rel="nofollow">Jeremy Lea</a>  at October  9, 2007  8:19 AM</p>
<a id="c1932156"></a>
<p>Did you ever see the parallel-access, all-atomic-functions hashtable on Google tech vids?</p>

<p>Very light on processor time, n level versioning.</p>

<p>Could be viewed as a file system, but very simple and very very fast</p>
<p class="posted">Posted by: deepwalkercr__(4t)__gmale  at November  1, 2007  7:13 AM</p>





</div>
</div>

</body></html>